54 research outputs found

    Incorporating In-Source Fragments Improves Metabolite Identification Accuracy in Untargeted LCMS and LCMS/MS Datasets.

    In untargeted metabolomics experiments, library search engines detect metabolites using several features, including precursor mass, isotopic distribution, retention time, and MS2 fragmentation. Matching acquired MS2 to library spectra is vital because numerous compounds share molecular formulas, resulting in identical precursor measurements and similar retention times. However, many metabolomics experiments are still collected using LC-MS only, and even in LC-MS/MS experiments many precursors lack MS2 spectra due to the stochastic nature of data-dependent acquisition. We observe that when metabolites ionize they can produce unanticipated MS1 features resulting from neutral losses, in-source fragmentation, multimerization, and adducts. Here we present a new approach that leverages these measurements to identify metabolites when MS2 spectra are of low quality or not available. We processed datasets of 75 known standards mixed with whole yeast lysates, stripping them of their MS2 scans, to produce a gold-standard MS1-only data set of a complex metabolome with known targets. For each dataset we determined the proportion of unambiguous annotations (where the correct annotation had a higher score than other potential annotations) and unmistakable annotations (where the correct annotation was the only valid annotation detected). We found that incorporating in-source fragments improved these metrics for both MS1-only datasets (increasing from 60% to 73% unambiguous and 40% to 65% unmistakable matches) and MS2 datasets (from 79% to 84% unambiguous and 41% to 60% unmistakable). Unexpectedly, in these data we observed that the MS2 spectra were less useful than in-source fragment data for improving identification accuracy. We believe this is largely because the low-resolution ion trap MS2 spectra collected in this experiment show significant noise, which diminishes spectral match scores and allows other candidates to outscore the correct identifications. We suspect that noise is less likely to affect MS1 peak groups because they are generated from data aggregated across multiple high-resolution MS1 scans.
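
The two metrics defined in this abstract can be made concrete with a short sketch. This is an illustration only, not the authors' implementation: the function names, the score-dictionary layout, and the example scores are all hypothetical. It assumes only that each known standard comes with a set of candidate annotations and a numeric score for each.

```python
from typing import Dict, List, Tuple


def classify(scores: Dict[str, float], correct: str) -> Tuple[bool, bool]:
    """Return (unambiguous, unmistakable) for one known standard.

    `scores` maps every candidate annotation detected for the standard
    to its match score (hypothetical layout, assumed for illustration).
    """
    if correct not in scores:
        # The correct annotation was not detected at all.
        return False, False
    rivals = [s for name, s in scores.items() if name != correct]
    # Unambiguous: the correct annotation outscores every other candidate.
    unambiguous = all(scores[correct] > s for s in rivals)
    # Unmistakable: the correct annotation is the only candidate detected.
    unmistakable = not rivals
    return unambiguous, unmistakable


def proportions(cases: List[Tuple[Dict[str, float], str]]) -> Tuple[float, float]:
    """Proportion of unambiguous and unmistakable annotations over all standards."""
    if not cases:
        raise ValueError("need at least one standard")
    flags = [classify(scores, correct) for scores, correct in cases]
    n = len(flags)
    return sum(u for u, _ in flags) / n, sum(m for _, m in flags) / n


if __name__ == "__main__":
    # Made-up scores for four hypothetical standards.
    example = [
        ({"A": 0.9, "B": 0.5}, "A"),  # unambiguous, not unmistakable
        ({"A": 0.9}, "A"),            # unambiguous and unmistakable
        ({"A": 0.4, "B": 0.6}, "A"),  # outscored by a rival
        ({"B": 0.6}, "A"),            # correct annotation missing
    ]
    print(proportions(example))
```

Under these definitions every unmistakable annotation is also unambiguous, which is consistent with the reported unmistakable percentages being lower than the unambiguous ones.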

    A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

    Background: Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, a desirable and non-trivial problem in motif discovery research. Results: We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions: Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesized 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/ . http://deepblue.lib.umich.edu/bitstream/2027.42/112965/1/12859_2012_Article_5570.pd

    Gene Gangs of the Chloroviruses: Conserved Clusters of Collinear Monocistronic Genes

    Chloroviruses (family Phycodnaviridae) are dsDNA viruses found throughout the world’s inland waters. The open reading frames in the genomes of 41 sequenced chloroviruses (330 ± 40 kbp each) representing three virus types were analyzed for evidence of evolutionarily conserved local genomic “contexts”, the organization of biological information into units of a scale larger than a gene. Despite a general loss of synteny between virus types, we informatically detected a highly conserved genomic context defined by groups of three or more genes that we have termed “gene gangs”. Unlike previously described local genomic contexts, the definition of gene gangs requires only that member genes be consistently co-localized; gene gangs are not constrained by strand, regulatory sites, or intervening sequences, and therefore represent a new type of conserved structural genomic element. An analysis of functional annotations and transcriptomic data suggests that some of the gene gangs may organize genes involved in specific biochemical processes, but that this organization does not involve their coordinated expression.

    Observing Strategies for Focused Orbital Debris Surveys Using the Magellan Telescope

    A breakup of the Titan 3C-17 Transtage rocket body was reported by the Space Surveillance Network (SSN) to have occurred on June 4th, 2014 at 02:38 UT. Five objects were associated with this breakup, which is the fourth known breakup for this class of object. There are likely many more objects associated with this event that are beyond the SSN's ability to detect and have not been catalogued. Several months after the breakup, observing time was obtained on the Magellan Baade 6.5-meter telescope for observations of geosynchronous (GEO) space debris targets. Using the NASA Standard Satellite Breakup Model (SSBM), a simulated debris cloud of the recent Transtage breakup was produced and propagated forward in time. This provided right ascension, declination, and tracking rate predictions for where debris associated with this breakup would be more likely to be found in the sky over Magellan during our observing run. Magellan observations were then optimized using the angles and tracking rates from the model predictions to focus the search for Transtage debris. Data were collected and analysed, and preliminary comparisons were made between the number of objects detected and the number expected from the model. We present our results here.

    The Exploration of Novel Regulatory Relationships Drives Haloarchaeal Operon-Like Structural Dynamics over Short Evolutionary Distances.

    Operons are a dominant feature of bacterial and archaeal genome organization. Numerous investigations have related aspects of operon structure to operon function, making operons exemplars for studies aimed at deciphering Nature’s design principles for genomic organization at a local scale. We consider this understanding to be both fundamentally important and ultimately useful in the de novo design of increasingly complex synthetic circuits. Here we analyze the evolution of the genomic context of operon-like structures in a set of 76 sequenced and annotated species of halophilic archaea. The phylogenetic depth and breadth of this dataset allow insight into changes in operon-like structures over shorter evolutionary time scales than have been studied in previous cross-species analyses of operon evolution. Our analysis, implemented in the updated software package JContextExplorer, finds that operon-like context, as measured by changes in structure, frequently differs from a sequence divergence model of whole-species phylogeny, and that changes seem to be dominated by the exploration of novel regulatory relationships.
